Asymmetric cryptography in I2P has always led to slowdowns: the Diffie-Hellman algorithm when establishing transport sessions and, in my opinion, the unfortunate choice of the ElGamal scheme in I2P addresses. This is especially noticeable when working on weak hardware and floodfills. The approach proposed in the article is based on the use of some features of I2P and allows you to achieve significant acceleration and reduce the load on the processor.
Currently, asymmetric encryption is based on the operation of exponentiation modulo and the hypothesis that discrete logarithms are practically impossible. There are plans to use other types of asymmetric encryption, but they have not yet been implemented.
The public key y is calculated based on the private key x using the formula y=g^x mod p, where p is a prime, and in the general case a triple of numbers (y, p, g) is passed as the key. In I2P, the numbers p and g are constants and must be the same for all implementations:
g = 2,
p = 2^2048 - 2^1984 - 1 + 2^64 * ([pi*2^1918] + 124476),
where pi denotes the number pi, and [] is the operation of taking the integer part of a number.
The length of p is 256 bytes, respectively, the length of the public key.
According to official documentation secret key length is 2048 bits for x64 and 226 bits for all others.
The use of calculated tables was previously discussed for EdDSA signatures.
Let us briefly recall the essence. An addition operation is defined over the curve points and it is required, using this operation, to multiply the constant base point by a 32-byte number. To do this, we will add byte by byte, taking the result of multiplying a point by the value of each byte from the table, calculated once at startup. This will require exactly 32 additions, while the calculation time becomes constant, which is extremely important for multiplying with a secret key.
We will do the same for the operation of raising g to a power modulo p, only instead of the addition operation we will use the multiplication operation.
Let's calculate a table of all possible values of powers of p for each byte, filling in an array of 255 elements for each byte, multiplying in order by g=2.
You will notice that the first elements of the table do not need to be stored, since they are obtained by setting a bit in the corresponding position, however, the value should not exceed p, in fact this corresponds to 2048 = 11 bits, that is, only one and a third of the table bytes.
Thus, for a private key with a length of 226 bits, the table size is 29*255*255 ~ 2 megabytes, for a key with a length of 2048, the table size will be 256*255*255 ~ 16 megabytes, which can be a significant amount, but in this case the effect is obtained more significant.
Since each exponentiation requires at least 29 multiplications modulo, it is advisable to use Montgomery multiplication.
Its meaning comes down to replacing one module with another, more convenient module for calculations. For practical needs, a power of two is chosen and then slow division is replaced by a fast bit shift.
The disadvantage is the need to convert to the Montgomery representation. However, in our case, this is done only once at the time of calculation of the table and the only additional action is to convert the result.
For practical purposes, we do not need to implement Montgomery multiplication - the corresponding functions are provided in OpenSSL.
specifies the module (in our case p)
Montgomery multiplication itself in the context with a given modulus
converting the result to normal representation.
Since release 2.7.0 supported and controlled by parameter precomputation.elgamal, by default it is disabled for x64 and enabled for others. It is recommended to enable it to work in floodfill mode if there are no strict restrictions on memory use. In this case, the processor load is reduced by more than 2 times due to the need to frequently establish connections with other nodes.
It is used for generating key pairs when establishing connections, asymmetric encryption when building tunnels, and “garlic” encryption between addresses. Similarly, DSA signing can be accelerated, but in I2P it is considered obsolete and is gradually being replaced by EdDSA.
The approach discussed in the article may seem trivial, but its implementation has shown its effectiveness, making it possible to work on those platforms that previously lacked performance.
Asymmetric encryption in I2P
Currently, asymmetric encryption is based on the operation of exponentiation modulo and the hypothesis that discrete logarithms are practically impossible. There are plans to use other types of asymmetric encryption, but they have not yet been implemented.
The public key y is calculated based on the private key x using the formula y=g^x mod p, where p is a prime, and in the general case a triple of numbers (y, p, g) is passed as the key. In I2P, the numbers p and g are constants and must be the same for all implementations:
g = 2,
p = 2^2048 - 2^1984 - 1 + 2^64 * ([pi*2^1918] + 124476),
where pi denotes the number pi, and [] is the operation of taking the integer part of a number.
The length of p is 256 bytes, respectively, the length of the public key.
According to official documentation secret key length is 2048 bits for x64 and 226 bits for all others.
Calculated tables
The use of calculated tables was previously discussed for EdDSA signatures.
Let us briefly recall the essence. An addition operation is defined over the curve points and it is required, using this operation, to multiply the constant base point by a 32-byte number. To do this, we will add byte by byte, taking the result of multiplying a point by the value of each byte from the table, calculated once at startup. This will require exactly 32 additions, while the calculation time becomes constant, which is extremely important for multiplying with a secret key.
We will do the same for the operation of raising g to a power modulo p, only instead of the addition operation we will use the multiplication operation.
Let's calculate a table of all possible values of powers of p for each byte, filling in an array of 255 elements for each byte, multiplying in order by g=2.
You will notice that the first elements of the table do not need to be stored, since they are obtained by setting a bit in the corresponding position, however, the value should not exceed p, in fact this corresponds to 2048 = 11 bits, that is, only one and a third of the table bytes.
Thus, for a private key with a length of 226 bits, the table size is 29*255*255 ~ 2 megabytes, for a key with a length of 2048, the table size will be 256*255*255 ~ 16 megabytes, which can be a significant amount, but in this case the effect is obtained more significant.
Montgomery Multiplication
Since each exponentiation requires at least 29 multiplications modulo, it is advisable to use Montgomery multiplication.
Its meaning comes down to replacing one module with another, more convenient module for calculations. For practical needs, a power of two is chosen and then slow division is replaced by a fast bit shift.
The disadvantage is the need to convert to the Montgomery representation. However, in our case, this is done only once at the time of calculation of the table and the only additional action is to convert the result.
For practical purposes, we do not need to implement Montgomery multiplication - the corresponding functions are provided in OpenSSL.
int BN_MONT_CTX_set(BN_MONT_CTX *mont, const BIGNUM *m, BN_CTX *ctx);
specifies the module (in our case p)
int BN_mod_mul_montgomery(BIGNUM *r, BIGNUM *a, BIGNUM *b, BN_MONT_CTX *mont, BN_CTX *ctx);
Montgomery multiplication itself in the context with a given modulus
int BN_from_montgomery(BIGNUM *r, BIGNUM *a, BN_MONT_CTX *mont, BN_CTX *ctx);
converting the result to normal representation.
Implementation in i2pd
Since release 2.7.0 supported and controlled by parameter precomputation.elgamal, by default it is disabled for x64 and enabled for others. It is recommended to enable it to work in floodfill mode if there are no strict restrictions on memory use. In this case, the processor load is reduced by more than 2 times due to the need to frequently establish connections with other nodes.
It is used for generating key pairs when establishing connections, asymmetric encryption when building tunnels, and “garlic” encryption between addresses. Similarly, DSA signing can be accelerated, but in I2P it is considered obsolete and is gradually being replaced by EdDSA.
The approach discussed in the article may seem trivial, but its implementation has shown its effectiveness, making it possible to work on those platforms that previously lacked performance.